PPDM: Output Privacy
...other than differential privacy, including: statistical disclosure control, query restriction/auditing, etc.
SoK
Nabil R. Adam, John C. Worthmann
cited by ~1300
nrryuya.icon > Good summary and comparison of naive, intuitive approaches
https://gyazo.com/3b97ab8c4615c07e415ee41a70208d79
Query restriction approach
Query-Set-Size Control -> too easy to compromise
Query-Set-Overlap Control -> not practical
Query Auditing -> high CPU time and storage requirements (at the time)
Partitioning (similar to k-anonymization), cell suppression
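The "too easy to compromise" verdict on query-set-size control can be seen with a classic tracker attack: two queries that each pass the size check, but whose difference isolates one individual. A toy sketch (the micro-database, column layout, and threshold are invented for illustration):

```python
# Hypothetical micro-database (rows are (name, dept, salary)) showing why
# query-set-size control is easy to compromise: a "tracker" pair of
# queries, each individually large enough to be answered, whose
# difference reveals one individual's salary.
ROWS = [
    ("alice", "eng", 100), ("bob", "eng", 90),
    ("carol", "eng", 95), ("dan", "eng", 85),
    ("erin", "hr", 70), ("frank", "hr", 72), ("gina", "hr", 75),
]
K = 3  # minimum query-set size the SDB enforces

def sum_query(pred):
    matched = [r for r in ROWS if pred(r)]
    if len(matched) < K:
        raise PermissionError("query set too small")
    return sum(r[2] for r in matched)

# Both queries pass the size check (4 and 3 rows respectively)...
all_eng = sum_query(lambda r: r[1] == "eng")
eng_without_alice = sum_query(lambda r: r[1] == "eng" and r[0] != "alice")
# ...yet their difference is exactly alice's confidential salary.
alice_salary = all_eng - eng_without_alice
```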
Data perturbation -> suffers from a bias problem
Probability-distribution category -> partial disclosure is easy for large SDBs
Fixed-data perturbation category -> limited to one confidential attribute.
Output perturbation
Random-Sample Queries
Varying-Output Perturbation
Rounding (Systematic/Random/Controlled) -> not effective (but can be combined with other methods)
V. Ciriani, S. De Capitani di Vimercati, S. Foresti, and P. Samarati
cited by ~100
Picking up sections about output privacy.
Section 5.7 Mine-and-Anonymize
Enforcing k-Anonymity on Association Rules/Decision Trees
Example of an inference channel violating k-anonymity and the solution by modifying the support of involved itemsets
https://gyazo.com/647cba10676f4d0f57c53d9ed21cca2d
Chapter 8: A Survey of Quantification of Privacy Preserving Data Mining Algorithms (pdf) Elisa Bertino, Dan Lin, Wei Jiang
Section 8.2.2 Result Privacy
Cites "When do data mining results violate privacy?" (ACM SIGKDD'04)
Inference channel in Bayesian classifiers
Section 8.4.2 Quality of the Data Mining Results
Introduces the metrics for classification, association rules
Chapter 9: A Survey of Utility-based Privacy-Preserving Data Transformation Methods
Ming Hua, Jian Pei
https://gyazo.com/f9980f0c0e4508317609bc2838279eae
Section 9.3 Utility-Based Anonymization Using Local Recoding
Improving the query answering accuracy on anonymized tables
Utility measure: normalized certainty penalty (NCP) i.e., normalized interval size after generalization
nrryuya.icon > not tailored to the actual queries
Algorithm: Bottom-up/top-down k-anonymization
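For a numeric attribute, the NCP of a generalized tuple reduces to interval width over domain width, summed (optionally weighted) over the QI attributes. A minimal sketch; function names and the weighting scheme are my own reading of the section, not the chapter's code:

```python
# Normalized certainty penalty (NCP) for numeric attributes: the width
# of the generalized interval divided by the width of the attribute's
# domain. An exact (ungeneralized) value costs 0.
def ncp_numeric(interval, domain):
    lo, hi = interval
    dom_lo, dom_hi = domain
    return (hi - lo) / (dom_hi - dom_lo)

def ncp_tuple(intervals, domains, weights=None):
    # Weighted sum of per-attribute penalties for one generalized tuple.
    if weights is None:
        weights = [1.0] * len(intervals)
    return sum(w * ncp_numeric(iv, dom)
               for w, iv, dom in zip(weights, intervals, domains))

# Age generalized to [20, 30] over domain [0, 100] costs 0.1.
```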
Section 9.4 The Utility-based Privacy Preserving Methods in Classification Problems
The top-down specialization (TDS): generalizes tuples in a table such that tuples in the same equivalence class (i.e., tuples sharing the same values on the QI) are as pure as possible with respect to class labels.
the information gain is measured by the entropy reduction on the affected equivalence classes
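The entropy-reduction score can be sketched as follows (a minimal reading of the description above, not the chapter's code): the gain of a specialization is the class-label entropy of the affected equivalence class minus the size-weighted entropy of the classes it splits into.

```python
from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy of the class-label distribution in one
    # equivalence class.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(parent_labels, child_groups):
    # Entropy reduction when a specialization splits the parent
    # equivalence class into child_groups.
    n = len(parent_labels)
    after = sum(len(g) / n * entropy(g) for g in child_groups)
    return entropy(parent_labels) - after

# Splitting a mixed class into two pure ones yields the maximal gain.
gain = info_gain(["+", "+", "-", "-"], [["+", "+"], ["-", "-"]])
```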
The progressive disclosure algorithm (PDA): to eliminate sensitive inferences, it suppresses some attribute values so that the confidence of each inference rule is kept below a user-defined threshold.
the information gain is defined as the entropy reduction on the tuples involving the disclosed value
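The suppression side of PDA can be sketched as a loop that suppresses the sensitive value from supporting records until the rule's confidence falls below the threshold (function names and the record-selection order are mine, for illustration only):

```python
def confidence(transactions, lhs, rhs):
    # conf(lhs -> rhs) = supp(lhs ∪ rhs) / supp(lhs)
    n_lhs = sum(1 for t in transactions if lhs <= t)
    n_both = sum(1 for t in transactions if (lhs | rhs) <= t)
    return n_both / n_lhs if n_lhs else 0.0

def suppress_until_safe(transactions, lhs, rhs, threshold):
    # Suppress the sensitive (rhs) value in one supporting record at a
    # time until the inference rule's confidence drops below threshold.
    txs = [set(t) for t in transactions]
    while confidence(txs, lhs, rhs) >= threshold:
        t = next(t for t in txs if (lhs | rhs) <= t)
        t -= rhs
    return txs

safe = suppress_until_safe(
    [{"job=eng", "hiv"}, {"job=eng", "hiv"}, {"job=eng"}, {"job=eng"}],
    {"job=eng"}, {"hiv"}, threshold=0.5)
```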
Section 9.5 Anonymized Marginal: Injecting Utility into Anonymized Data Sets
A marginal (i.e., SELECT COUNT(*) FROM t GROUP BY A) is anonymized if some of its attribute values are generalized.
A utility measure is defined as the difference between the distribution of the original data and that of the anonymized data (KL-divergence).
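The utility loss of an anonymized marginal is then the KL-divergence between the original distribution P and the distribution Q induced by the generalized data. A small sketch with invented example values:

```python
from math import log2

def kl_divergence(p, q):
    # D_KL(P || Q) in bits; the larger it is, the more the anonymized
    # marginal distorts the original distribution.
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Original marginal over four cells vs. the uniform marginal obtained
# when all four values are generalized into one interval.
p = [0.5, 0.25, 0.125, 0.125]
q = [0.25, 0.25, 0.25, 0.25]
loss = kl_divergence(p, q)
```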
Chapter 10: Mining Association Rules under Privacy Constraints
Jayant R. Haritsa
Privacy Metric: Average Privacy, Worst-case Privacy, Re-interrogated Privacy, Amplification Privacy
Accuracy Metric: Support Error, Identity Error
Literature 1: Input data privacy (MASK, Cut-and-Paste Operator, Algebraic-distortion)
FRAPP: a generalized matrix-theoretic framework for the design of random perturbation
Literature 2: Output rule privacy (support/confidence-based hiding, data blocking)
See also: Chapter 11
Experimental result
https://gyazo.com/987d1f412afb576d2581e6539771ce91
Chapter 11: A Survey of Association Rule Hiding Methods for Privacy
Vassilios S. Verykios, Aris Gkoulalas-Divanis
Association rule hiding, i.e., it is not the data but the sensitive rules that create a privacy breach
Taxonomy: 1. support/confidence-based, 2. distortion/blocking, 3. single/multiple rule, 4. heuristic/exact, 5. user specified sensitive rules
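A toy support-based, distortion-style hider in the spirit of this taxonomy (the item-selection heuristic is mine): delete one item of the sensitive itemset from supporting transactions until its support falls below the mining threshold, so the rule is no longer discovered.

```python
def support(transactions, itemset):
    # Fraction of transactions containing the whole itemset.
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def hide_itemset(transactions, itemset, min_sup):
    # Distortion: repeatedly remove one item of the sensitive itemset
    # from a supporting transaction until supp(itemset) < min_sup.
    txs = [set(t) for t in transactions]
    victim = min(itemset)  # deterministic choice of the item to distort
    while support(txs, itemset) >= min_sup:
        t = next(t for t in txs if itemset <= t)
        t.discard(victim)
    return txs

hidden = hide_itemset(
    [{"a", "b"}, {"a", "b"}, {"a"}, {"b"}], {"a", "b"}, min_sup=0.5)
```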
Chapter 12: A Survey of Statistical Approaches to Preserving Confidentiality of Contingency Table Entries
Stephen E. Fienberg, Aleksandra B. Slavkovic
On association rule mining:
Support is a marginal table, and confidence is a conditional table, both corresponding to a subset of variables making up the full table.
Identification of cells in contingency table (Theorem 12.1)
Trivially, for bivariate tables, the joint probability distribution is the support, so together with knowledge of the sample size n, an association rule reveals all cell counts. The above results also imply that releasing the confidence of a rule along with some marginal information again identifies all entries in the table.
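The bivariate case can be made concrete: for a 2x2 table over binary A and B, the released supports (as fractions) plus the sample size n pin down every cell count. The numbers below are invented for illustration.

```python
def cells_from_supports(n, s_a, s_b, s_ab):
    # Recover all four cells of a 2x2 contingency table over binary
    # attributes A, B from sample size n and supports supp(A), supp(B),
    # supp(A,B).
    n11 = round(n * s_ab)        # A=1, B=1
    n10 = round(n * s_a) - n11   # A=1, B=0
    n01 = round(n * s_b) - n11   # A=0, B=1
    n00 = n - n11 - n10 - n01    # A=0, B=0
    return n11, n10, n01, n00

# With n=100, supp(A)=0.6, supp(B)=0.5, supp(A,B)=0.3, the full table
# is determined: nothing about the cells remains hidden.
table = cells_from_supports(100, 0.6, 0.5, 0.3)
```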
Linear programming (LP) relaxation bounds, Integer programming (IP) relaxation bounds
https://gyazo.com/922d762349b6fa43d29af0de093f968e
Chapter 16: Private Data Analysis via Output Perturbation
Kobbi Nissim
ε-differential privacy
Application: Sum, Mean, Covariance, Histograms, Subset Sum, Distance to Property, k-Means Clustering, SVD, PCA, Learning in the Statistical Queries
Beyond the Basics: Instance Based Noise and Smooth Sensitivity, The Sample-Aggregate Framework, A General Sanitization Mechanism
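The basic output-perturbation recipe from this chapter, applied to SUM: add Laplace noise scaled to the query's global sensitivity. This is a sketch; the clipping bounds and function names are mine.

```python
import math
import random

def laplace_sample(scale, rng):
    # Inverse-CDF sampling of Laplace(0, scale).
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_sum(values, upper, epsilon, rng=None):
    # epsilon-differentially private SUM via the Laplace mechanism.
    # Each value is clipped into [0, upper], so one record changes the
    # sum by at most `upper` (the global sensitivity), and noise of
    # scale upper/epsilon suffices.
    rng = rng or random.Random()
    clipped = [min(max(v, 0.0), upper) for v in values]
    return sum(clipped) + laplace_sample(upper / epsilon, rng)
```

With a large epsilon the noise is tiny and the answer is close to the clipped sum; shrinking epsilon trades accuracy for privacy.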
Chapter 17: A Survey of Query Auditing Techniques for Data Privacy (pdf) Shubha U. Nabar (Stanford University) et al.
Definitions of privacy notions: Full disclosure, Partial disclosure (λ-safe), Perfect Privacy
A General Approach for Constructing Private Randomized Auditors
https://gyazo.com/674540dd6c9da26ec1ba4a61162d6361
Implementations
Templ M, Kowarik A, Meindl B
sdcMicro: R package to anonymize microdata. Most functionalities of the package are also available via an interactive shiny-based graphical user interface. (docs)
"SOFTWARE TOOLS FOR STATISTICAL DISCLOSURE CONTROL" Slides
Aastha Mehta, Eslam Elnikety, Katura Harvey, Deepak Garg, and Peter Druschel
Qapla provides an alternate approach to policy enforcement that neither depends on application correctness, nor on specialized database support.
In Qapla, policies are specific to rows and columns, may additionally refer to the querier's identity and the time of the query, are specified in SQL, and are stored in the database itself.
Related
Femi Olumofin, Ian Goldberg
Example: Domain name registration